I. Progress Summary

Signal Innovations Group (SIG) has been working closely with the Naval Research Laboratory (NRL) to develop advanced algorithms for detecting and classifying MCM targets, using data collected with the NRL sonar system. During the current period of performance, SIG delivered kernel matching pursuits (KMP) software to NRL, which NRL employed at the most recent blind test. Details of the KMP algorithm are provided below. NRL has also recently delivered the data from that blind test to SIG, and SIG is currently processing it. NRL will soon deliver to SIG the data from its most recent sea test for processing at SIG.

As detailed below, the KMP algorithm assumes access to a separate set of training data for the mines and clutter items of interest in the environment under test. This assumption was valid for the blind test that NRL executed. However, in many problems of practical importance one may not have an appropriate set of training data, because the target and clutter characteristics, as well as the channel properties, may change. To address this problem, SIG has been examining in situ learning algorithms, in which the sensing phase is integrated with classifier design. Details of this in situ learning algorithm (also termed active learning) are provided below.

II. Kernel Matching Pursuits Details


We are interested in learning sparse kernel machines of the functional form

$$ f_n(\mathbf{x}) = \sum_{i=1}^{n} w_{n,i}\, K(\mathbf{c}_i, \mathbf{x}) + w_{n,0} = \mathbf{w}_n^T \boldsymbol{\phi}_n(\mathbf{x}) \tag{1} $$

where $w_{n,0}$ is the bias term, $K(\cdot,\cdot)$ is a kernel function measuring the similarity between two data samples,

$$ \boldsymbol{\phi}_n(\cdot) = [1, K(\mathbf{c}_1,\cdot), K(\mathbf{c}_2,\cdot), \ldots, K(\mathbf{c}_n,\cdot)]^T \tag{2} $$

with $K(\mathbf{c}_i,\cdot)$ the kernel-induced basis function centered at $\mathbf{c}_i$, and

$$ \mathbf{w}_n = [w_{n,0}, w_{n,1}, w_{n,2}, \ldots, w_{n,n}]^T \tag{3} $$

are the weights that combine the basis functions in the summation. The subscript $n$ denotes the number of basis functions being used, with $n < N$, where $N$ is the number of training samples. In the binary classification problem considered in this section, a given $\mathbf{x}$ is mapped to an estimated $y \in \{0,1\}$ as $y = U[f_n(\mathbf{x}) - 0.5]$, where $U(a)$ is the unit step function, equal to one for $a > 0$ and zero otherwise. The form in (1) is the same as that used in the SVM and RVM, although for the SVM $K(\mathbf{c}_i, \mathbf{x})$ must be a Mercer kernel, while for the RVM and KMP this is not necessary.
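The model in (1) to (3) and the decision rule $y = U[f_n(\mathbf{x}) - 0.5]$ can be sketched as follows. This is a minimal illustration, not the delivered KMP software: it assumes NumPy and a Gaussian RBF kernel with unit width (the text allows any kernel), and the centers and weights in the usage lines are hypothetical values chosen by hand.

```python
import numpy as np

def rbf_kernel(c, x, width=1.0):
    """Gaussian RBF kernel K(c, x); this kernel choice and width are assumptions."""
    c, x = np.asarray(c, float), np.asarray(x, float)
    return float(np.exp(-np.sum((c - x) ** 2) / (2.0 * width ** 2)))

def basis_vector(x, centers, kernel=rbf_kernel):
    """phi_n(x) = [1, K(c_1, x), ..., K(c_n, x)]^T, as in (2)."""
    return np.array([1.0] + [kernel(c, x) for c in centers])

def kmp_output(x, centers, w, kernel=rbf_kernel):
    """f_n(x) = w_n^T phi_n(x), as in (1); w[0] is the bias w_{n,0}."""
    return float(np.asarray(w, float) @ basis_vector(x, centers, kernel))

def classify(x, centers, w, kernel=rbf_kernel):
    """y = U[f_n(x) - 0.5], with U(a) = 1 for a > 0 and 0 otherwise."""
    return 1 if kmp_output(x, centers, w, kernel) - 0.5 > 0 else 0

# Usage with two hypothetical centers and hand-picked weights
centers = [[0.0, 0.0], [3.0, 3.0]]
w = [0.1, 0.9, -0.2]      # [w_{n,0}, w_{n,1}, w_{n,2}]
print(classify([0.0, 0.0], centers, w), classify([3.0, 3.0], centers, w))  # prints: 1 0
```

Because the kernel is passed in as a function, a non-Mercer kernel can be substituted without changing the rest of the sketch, which mirrors the point that the KMP (unlike the SVM) does not require a Mercer kernel.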

The KMP implements a set of functions of the form in (1). Assume we are given a training set $\{\mathbf{x}_i, y_i\}_{i=1}^{N}$, where $\mathbf{x}_i$ is the $i$th input and $y_i$ its expected output. The weighted sum of squared errors between the expected output and the KMP output given in (1) is

$$ e_n = \sum_{i=1}^{N} \beta_i \left[ y_i - \mathbf{w}_n^T \boldsymbol{\phi}_n(\mathbf{x}_i) \right]^2 \tag{4} $$

where $\beta_i$ is a constant quantifying the importance of the $i$th training sample $(\mathbf{x}_i, y_i)$. For example, $1/\beta_i$ may represent the variance of the $i$th measurement, so that noisy measurements are given less importance when learning the model. In addition, if one has a priori knowledge that some data $\mathbf{x}_i$ are in some sense “better” representatives of the system being modeled, this can be accounted for in the parameter $\beta_i$.

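The unknowns in (4) are the centers $\mathbf{c}_i$ of the basis functions in $\boldsymbol{\phi}_n$ and the weights $\mathbf{w}_n$. The determination of the $\mathbf{c}_i$ is addressed separately below. For the moment we suppose that the $\mathbf{c}_i$, and consequently the $\boldsymbol{\phi}_{n,i}$, are known, and we solve for $\mathbf{w}_n$. The value of $\mathbf{w}_n$ that minimizes (4) is

$$ \mathbf{w}_n = \mathbf{M}_n^{-1} \{\beta_i \boldsymbol{\phi}_{n,i}\, y_i\}_i \tag{5} $$

where $\boldsymbol{\phi}_{n,i}$ is an abbreviation of $\boldsymbol{\phi}_n(\mathbf{x}_i)$, $\{\cdot\}_i = \sum_{i=1}^{N} (\cdot)$, and

$$ \mathbf{M}_n = \sum_{i=1}^{N} \beta_i\, \boldsymbol{\phi}_n(\mathbf{x}_i)\, \boldsymbol{\phi}_n^T(\mathbf{x}_i) = \{\beta_i \boldsymbol{\phi}_{n,i} \boldsymbol{\phi}_{n,i}^T\}_i \tag{6} $$

is the Fisher information matrix. Note that for (5) to be a BLUE estimate, we have not had to make any assumptions regarding the statistics of $y$ conditional on $\mathbf{x}$, other than that of a finite second moment.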
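For fixed centers, (5) and (6) are an ordinary weighted least-squares solve. The sketch below assumes NumPy and a Gaussian RBF kernel (the helper names `design_matrix`, `fit_weights`, and `weighted_sse` are hypothetical); it solves the linear system $\mathbf{M}_n \mathbf{w}_n = \{\beta_i \boldsymbol{\phi}_{n,i} y_i\}_i$ instead of forming $\mathbf{M}_n^{-1}$ explicitly, which gives the same minimizer of (4) with better numerical behavior.

```python
import numpy as np

def design_matrix(X, centers, width=1.0):
    """N x (n+1) array whose row i is phi_n(x_i)^T = [1, K(c_1, x_i), ..., K(c_n, x_i)].

    X is N x d, centers is n x d; a Gaussian RBF kernel is assumed.
    """
    X = np.asarray(X, float)
    C = np.asarray(centers, float)
    sq = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)   # N x n squared distances
    return np.hstack([np.ones((X.shape[0], 1)), np.exp(-sq / (2.0 * width ** 2))])

def fit_weights(Phi, y, beta):
    """w_n from (5), with the Fisher information matrix M_n from (6)."""
    Phi, y, beta = np.asarray(Phi, float), np.asarray(y, float), np.asarray(beta, float)
    M = Phi.T @ (beta[:, None] * Phi)     # {beta_i phi_{n,i} phi_{n,i}^T}_i
    r = Phi.T @ (beta * y)                # {beta_i phi_{n,i} y_i}_i
    return np.linalg.solve(M, r)

def weighted_sse(Phi, y, beta, w):
    """e_n from (4)."""
    resid = np.asarray(y, float) - Phi @ w
    return float(np.sum(np.asarray(beta, float) * resid ** 2))
```

Setting `beta` to the reciprocal of per-sample noise variances reproduces the weighting described after (4); with all `beta` equal to one it reduces to unweighted least squares.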

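An $n$th-order KMP employs $n$ basis functions. The $(n+1)$th-order KMP is written inductively as

$$ f_{n+1}(\mathbf{x}) = \mathbf{w}_{n+1}^T \boldsymbol{\phi}_{n+1}(\mathbf{x}) \tag{7} $$

where

$$ \mathbf{w}_{n+1} = [w_{n+1,0}, w_{n+1,1}, \ldots, w_{n+1,n+1}]^T \tag{8} $$

$$ \boldsymbol{\phi}_{n+1}(\cdot) = [1, K(\mathbf{c}_1,\cdot), K(\mathbf{c}_2,\cdot), \ldots, K(\mathbf{c}_n,\cdot), K(\mathbf{c}_{n+1},\cdot)]^T = [\boldsymbol{\phi}_n^T(\cdot), \phi_{n+1}(\cdot)]^T $$

with $\phi_{n+1}(\cdot) = K(\mathbf{c}_{n+1},\cdot)$ a new basis function centered at $\mathbf{c}_{n+1}$. The weighted sum of squared errors of the $(n+1)$th-order KMP is

$$ e_{n+1} = \sum_{i=1}^{N} \beta_i \left[ y_i - f_{n+1}(\mathbf{x}_i) \right]^2 \tag{9} $$

Assuming the basis functions in $\boldsymbol{\phi}_{n+1}$ are all known, then, by analogy with (5),

$$ \mathbf{w}_{n+1} = \mathbf{M}_{n+1}^{-1} \{\beta_i \boldsymbol{\phi}_{n+1,i}\, y_i\}_i \tag{10} $$

minimizes (9), where the Fisher information matrix $\mathbf{M}_{n+1}$ is given by

$$ \mathbf{M}_{n+1} = \{\beta_i \boldsymbol{\phi}_{n+1,i} \boldsymbol{\phi}_{n+1,i}^T\}_i $$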

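One may show that $\mathbf{w}_{n+1}$ and $e_{n+1}$ are related to $\mathbf{w}_n$ and $e_n$, respectively, as

$$ \mathbf{w}_{n+1} = \begin{bmatrix} \mathbf{w}_n + \mathbf{M}_n^{-1}\{\beta_i \boldsymbol{\phi}_{n,i}\phi_{n+1,i}\}_i\, b^{-1}\left[\{\beta_i \phi_{n+1,i}\boldsymbol{\phi}_{n,i}^T\}_i \mathbf{w}_n - \{\beta_i \phi_{n+1,i} y_i\}_i\right] \\ -b^{-1}\{\beta_i \phi_{n+1,i}\boldsymbol{\phi}_{n,i}^T\}_i \mathbf{w}_n + b^{-1}\{\beta_i \phi_{n+1,i} y_i\}_i \end{bmatrix} \tag{11A} $$

$$ e_{n+1} = e_n - \delta e(K, \mathbf{c}_{n+1}) \tag{11B} $$

where

$$ b = \{\beta_i \phi_{n+1,i}^2\}_i - \{\beta_i \phi_{n+1,i}\boldsymbol{\phi}_{n,i}^T\}_i\, \mathbf{M}_n^{-1} \{\beta_i \boldsymbol{\phi}_{n,i}\phi_{n+1,i}\}_i \tag{12} $$

and

$$ \delta e(K, \mathbf{c}_{n+1}) = b^{-1}\left[\{\beta_i \phi_{n+1,i} y_i\}_i - \{\beta_i \phi_{n+1,i}\boldsymbol{\phi}_{n,i}^T\}_i \mathbf{w}_n\right]^2 \tag{13} $$

with $\phi_{n+1,i} = K(\mathbf{c}_{n+1}, \mathbf{x}_i)$.

CDRL A001
Contract No. N00014-06-C0026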
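The order-recursive update in (11A), (11B), (12) and (13) can be sketched as a single step that reuses $\mathbf{M}_n^{-1}$. This sketch assumes NumPy, and the function name `add_basis` is hypothetical. It also carries $\mathbf{M}_{n+1}^{-1}$ forward via the standard Schur-complement block inverse (an implementation choice not stated in the text), so no $(n+2)\times(n+2)$ inversion is needed at the next order.

```python
import numpy as np

def add_basis(Phi_n, phi_new, y, beta, w_n, M_inv):
    """One order-recursive KMP step following (11A), (12) and (13).

    Phi_n   : N x (n+1) array, row i = phi_n(x_i)^T
    phi_new : length-N array of phi_{n+1,i} = K(c_{n+1}, x_i)
    y, beta : length-N arrays of targets y_i and weights beta_i
    w_n     : current weights from (5)
    M_inv   : M_n^{-1}
    Returns (w_{n+1}, delta_e(K, c_{n+1}), M_{n+1}^{-1}).
    """
    bp = beta * phi_new
    m = Phi_n.T @ bp                    # {beta_i phi_{n,i} phi_{n+1,i}}_i
    u = M_inv @ m                       # M_n^{-1} {beta_i phi_{n,i} phi_{n+1,i}}_i
    b = bp @ phi_new - m @ u            # (12)
    g = bp @ y - m @ w_n                # bracketed term in (13); its negative appears in (11A)
    w_next = np.append(w_n - u * g / b, g / b)   # (11A)
    delta_e = g ** 2 / b                          # (13); e_{n+1} = e_n - delta_e, per (11B)
    # Schur-complement block inverse of M_{n+1}, reusable at the next order
    M_inv_next = np.block([[M_inv + np.outer(u, u) / b, -u[:, None] / b],
                           [-u[None, :] / b, np.array([[1.0 / b]])]])
    return w_next, delta_e, M_inv_next
```

A quick consistency check is to compare `w_next` with a direct solve of (10) on the augmented basis, and `delta_e` with the drop in (4) from order $n$ to order $n+1$; both agree to rounding error for any full-rank set of columns.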


Since $\delta e(K, \mathbf{c}_{n+1})$ depends on the center $\mathbf{c}_{n+1}$ of the new basis function, different choices of $\mathbf{c}_{n+1}$ yield different values of $\delta e(K, \mathbf{c}_{n+1})$. If we confine $\mathbf{c}_{n+1}$ to be selected from the training data, we may conduct a “greedy” search over the training set, excluding previously selected data to avoid repetition. Formally,

$$ \mathbf{c}_{n+1} = \mathbf{x}_{i_{n+1}} = \arg\max_{\mathbf{x}_k:\; 1 \le k \le N,\; k \notin \{i_1, \ldots, i_n\}} \delta e(K, \mathbf{x}_k) \tag{14} $$
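The greedy search in (14) can be sketched as a forward-selection loop that scores every unused training sample with (12) and (13). This is an illustrative sketch, assuming NumPy, a Gaussian RBF kernel of fixed width, and a bias-only starting model; the names `rbf_gram` and `greedy_kmp` and the tolerance `tol` are hypothetical. For clarity it re-solves (5) at each order rather than using the recursion in (11A).

```python
import numpy as np

def rbf_gram(A, B, width):
    """G[i, k] = exp(-||a_i - b_k||^2 / (2 width^2)); a Gaussian RBF kernel is assumed."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * width ** 2))

def greedy_kmp(X, y, beta, n_bases, width=1.0, tol=1e-12):
    """Greedy center selection following (14): at each order add the unused
    training sample x_k that maximizes delta_e(K, x_k) from (12)-(13).
    Returns (chosen indices, final weights, errors e_0, e_1, ...)."""
    X, y, beta = np.asarray(X, float), np.asarray(y, float), np.asarray(beta, float)
    G = rbf_gram(X, X, width)           # column k holds K(x_k, x_i) for every i
    Phi = np.ones((len(y), 1))          # order 0: bias-only model
    chosen, errors = [], []
    while True:
        M = Phi.T @ (beta[:, None] * Phi)                 # Fisher matrix (6)
        w = np.linalg.solve(M, Phi.T @ (beta * y))        # weights (5)
        resid = y - Phi @ w
        errors.append(float(np.sum(beta * resid ** 2)))   # e_n from (4)
        if len(chosen) == n_bases:
            return chosen, w, errors
        best_k, best_de = None, -np.inf
        for k in range(len(y)):
            if k in chosen:
                continue                                  # no repeated centers
            bp = beta * G[:, k]
            m = Phi.T @ bp
            b = bp @ G[:, k] - m @ np.linalg.solve(M, m)  # (12)
            if b <= tol:
                continue                                  # column already in the span
            de = (bp @ resid) ** 2 / b                    # (13)
            if de > best_de:
                best_k, best_de = k, de
        if best_k is None:
            return chosen, w, errors                      # nothing left to add
        chosen.append(best_k)
        Phi = np.hstack([Phi, G[:, [best_k]]])
```

Because each step adds the column with the largest error reduction, the recorded errors are non-increasing, consistent with (11B).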


As (13) shows, $\delta e(K, \mathbf{c}_{n+1})$ depends on the functional form of the kernel as well as on the support sample $\mathbf{c}_{n+1}$. This allows us to optimize the kernel to gain further error reduction. A simple approach is to first conduct a “greedy” search for $\mathbf{c}_{n+1}$ in the training set for a fixed kernel, and then fix $\mathbf{c}_{n+1}$ and optimize the parameters of the kernel. For radial basis function (RBF) kernels, the only parameter other than $\mathbf{c}_{n+1}$ is the kernel width, so optimization of RBF kernels with $\mathbf{c}_{n+1}$ fixed is a one-dimensional search for the kernel width. It is also possible to optimize $\mathbf{c}_{n+1}$ and the kernel width simultaneously, but then $\mathbf{c}_{n+1}$ is treated as a free parameter and is no longer confined to the training set. Another possibility is optimization over kernels of different functional forms, which offers greater diversity of the basis functions available to the KMP.
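The one-dimensional width search with $\mathbf{c}_{n+1}$ fixed can be sketched by evaluating (12) and (13) over a grid of candidate widths. The sketch assumes NumPy and a Gaussian RBF kernel; the grid search itself, the candidate grid, and the names `delta_e` and `best_width` are hypothetical choices (a scalar optimizer over the width would serve equally well).

```python
import numpy as np

def delta_e(Phi, y, beta, phi_new, tol=1e-12):
    """Error reduction (13) from appending the column phi_new to the current basis Phi."""
    M = Phi.T @ (beta[:, None] * Phi)
    w = np.linalg.solve(M, Phi.T @ (beta * y))      # current weights (5)
    bp = beta * phi_new
    m = Phi.T @ bp
    b = bp @ phi_new - m @ np.linalg.solve(M, m)    # (12)
    g = bp @ (y - Phi @ w)                          # bracketed term in (13)
    return g ** 2 / b if b > tol else 0.0

def best_width(X, y, beta, Phi, center, widths):
    """With c_{n+1} fixed, return the RBF width from `widths` that maximizes delta_e."""
    sq = ((np.asarray(X, float) - center) ** 2).sum(axis=1)
    scores = [delta_e(Phi, y, beta, np.exp(-sq / (2.0 * s ** 2))) for s in widths]
    k = int(np.argmax(scores))
    return widths[k], scores[k]
```

For example, if the targets are themselves an RBF bump of width 0.5 about the fixed center, the search over `[0.1, 0.25, 0.5, 1.0, 2.0]` returns 0.5, and the reduction equals the whole error of a bias-only model.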


III. In Situ Learning

Assume that the procedure discussed above selects $n$ bases from the observed data $X$. We now require labeled data to optimize the associated model weights $\mathbf{w}$. In a manner analogous to the previous discussion, we select those $\mathbf{x}_i \in X$ for which knowledge of the associated labels $y_i$ would be most informative for defining $\mathbf{w}$. The $\mathbf{x}_i$ so selected define a subset of signatures $X_s \subset X$, and these items are excavated to yield the respective set of labels $L_s$. The signatures and labels $(X_s, L_s)$ are then used to define the weights $\mathbf{w}$ in a least-squares sense, and the resulting model $f(\mathbf{x})$ is used to specify which of the remaining signatures $\mathbf{x} \notin X_s$ are likely targets of interest.

Assume that there are $J$ signatures in $X_s$, denoted $X_{s,J}$. We quantify the information content of $X_{s,J}$ for estimating the model weights $\mathbf{w}$, and further ask which $\mathbf{x}_i \notin X_{s,J}$ would be most informative if it and its label were added for the determination of $\mathbf{w}$. We have

$$ \mathbf{M}_n(X_{s,J}) = \sum_{j=1}^{J} \sigma^{-2}\, \boldsymbol{\phi}_{n,i_j} \boldsymbol{\phi}_{n,i_j}^T, \qquad q_n(X_{s,J}) = \ln \det \mathbf{M}_n(X_{s,J}) \tag{15} $$

where $i_j$ is the index of the $j$th signature in $X_{s,J}$ and $\sigma^{-2}$ weights each signature as $\beta_i$ does in (6). Both expressions in (15) employ an $n$-dimensional basis set $B_n \subset X$. In (15) the basis set $B_n$ is known and fixed, and we sum only over those signatures $X_{s,J}$ for which knowledge of the associated labels is most informative in defining the model weights $\mathbf{w}$.


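After adding a new signature $\mathbf{x}_i \in X$, $\mathbf{x}_i \notin X_{s,J}$, we have $X_{s,J+1}$, and $\mathbf{M}_n$ is updated as

$$ \mathbf{M}_n(X_{s,J+1}) = \mathbf{M}_n(X_{s,J}) + \sigma^{-2}\, \boldsymbol{\phi}_{n,i_{J+1}} \boldsymbol{\phi}_{n,i_{J+1}}^T \tag{16} $$

where $i_{J+1}$ is the index of the new signature selected for $X_{s,J+1}$. Using the matrix identity $\det(\mathbf{A} + \mathbf{F}\mathbf{F}^T) = \det(\mathbf{I} + \mathbf{F}^T \mathbf{A}^{-1} \mathbf{F}) \det(\mathbf{A})$, one obtains from (16)

$$ q_n(X_{s,J+1}) = q_n(X_{s,J}) + \ln p(\mathbf{x}_{i_{J+1}}) \tag{17} $$

with

$$ p(\mathbf{x}_{i_{J+1}}) = 1 + \sigma^{-2}\, \boldsymbol{\phi}_{n,i_{J+1}}^T\, \mathbf{M}_n^{-1}(X_{s,J})\, \boldsymbol{\phi}_{n,i_{J+1}} \tag{18} $$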
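The log-determinant update (17), with the gain (18), can be checked numerically. This sketch assumes NumPy, a known scalar $\sigma^2$, and random hypothetical basis vectors standing in for $\boldsymbol{\phi}_{n,i_j}$; `np.linalg.slogdet` is used so that $q_n$ is computed without overflow.

```python
import numpy as np

def q_n(M):
    """q_n = ln det M_n, computed stably with slogdet."""
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("M_n must be positive definite")
    return float(logdet)

def ln_p(phi, M, sigma2=1.0):
    """ln p(x) from (18): ln(1 + sigma^{-2} phi^T M^{-1} phi)."""
    return float(np.log1p(phi @ np.linalg.solve(M, phi) / sigma2))

# Check (17) on hypothetical basis vectors (n + 1 = 4, J = 6)
rng = np.random.default_rng(0)
sigma2 = 0.5
Phi_J = rng.normal(size=(6, 4))                      # rows: phi_{n,i_j}^T for j = 1..J
M_J = Phi_J.T @ Phi_J / sigma2                       # (15)
phi_new = rng.normal(size=4)                         # phi_{n,i_{J+1}}
M_next = M_J + np.outer(phi_new, phi_new) / sigma2   # (16)
print(np.isclose(q_n(M_next), q_n(M_J) + ln_p(phi_new, M_J, sigma2)))  # prints: True
```

The practical point of (17) is that the information gain of a candidate signature needs only one quadratic form in $\mathbf{M}_n^{-1}(X_{s,J})$, not a fresh determinant.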


Care is needed in evaluating the inverse of $\mathbf{M}_n$, since if $J < n$ the matrix is rank deficient. We have considered two ways of addressing this. A standard approach for inverting such matrices is to add a small diagonal term to $\mathbf{M}_n$ so that its inverse exists. Alternatively, by construction one can assume that the items associated with the basis $B_n$ are all contained in $X_{s,J}$, yielding a minimum of $n$ labeled data and thereby ensuring that the matrix is full rank. We have examined both procedures, and they yield comparable results.


Having addressed the inverse of $\mathbf{M}_n$, one iteratively maximizes $\ln p(\mathbf{x})$ to obtain

$$ \mathbf{x}_{i_{J+1}} = \arg\max_{\mathbf{x} \in X,\; \mathbf{x} \notin X_{s,J}} \ln p(\mathbf{x}) \tag{19} $$

Note that defining $\mathbf{x}_{i_{J+1}}$ again does not require the signature labels. The elements of $X_s$ are selected iteratively, in the “greedy” fashion indicated in (19), until the information gain falls below a prescribed threshold. After $J$ iterations we have defined those signatures $X_{s,J}$ for which knowledge of the labels will best approximate the weights $\mathbf{w}$. These items are excavated, yielding the labels $L_{s,J}$.
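The label-free greedy selection in (19) can be sketched as below. It assumes NumPy and a known scalar $\sigma^2$, and it uses the first of the two remedies above (a small diagonal term), with a hypothetical `ridge` value; `min_gain` plays the role of the prescribed stopping threshold, and its value, like the name `select_for_labeling`, is an assumption. $\mathbf{M}_n^{-1}$ is maintained with the Sherman-Morrison form of the rank-one update (16) rather than re-inverted each iteration.

```python
import numpy as np

def select_for_labeling(Phi, sigma2=1.0, ridge=1e-3, min_gain=1e-2, max_items=None):
    """Greedy, label-free selection of signatures to excavate, following (19).

    Phi      : N x (n+1) array; row i is phi_n(x_i)^T for the fixed basis B_n
    ridge    : small diagonal term so M_n is invertible while J is small (assumed value)
    min_gain : stop when the best ln p(x) falls below this (hypothetical threshold)
    Returns the selected row indices and the gain recorded at each step.
    """
    Phi = np.asarray(Phi, float)
    N, d = Phi.shape
    M_inv = np.eye(d) / ridge                  # inverse of ridge * I, i.e. J = 0
    chosen, gains_taken = [], []
    limit = N if max_items is None else max_items
    while len(chosen) < limit:
        # ln p(x) = ln(1 + sigma^{-2} phi^T M^{-1} phi) for every candidate, (18)
        gains = np.log1p(np.einsum("ij,jk,ik->i", Phi, M_inv, Phi) / sigma2)
        if chosen:
            gains[chosen] = -np.inf            # candidates must lie outside X_{s,J}
        k = int(np.argmax(gains))
        if gains[k] < min_gain:
            break
        chosen.append(k)
        gains_taken.append(float(gains[k]))
        # Sherman-Morrison form of the rank-one update (16), applied to M^{-1}
        u = M_inv @ Phi[k]
        M_inv -= np.outer(u, u) / (sigma2 + Phi[k] @ u)
    return chosen, gains_taken
```

No labels enter the loop; only the returned indices would be sent for excavation. Because $\mathbf{M}_n$ only grows, the recorded gains are non-increasing, so a threshold on the gain is a natural stopping rule.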

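Under the assumptions underlying the linear model in (1), and given $B_n$ and $(X_{s,J}, L_{s,J})$, the optimal estimate of the weights $\mathbf{w}$ is

$$ \mathbf{w} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{y} \tag{20} $$

where $\mathbf{y}$ represents the set of labels determined via the $J$ excavations,

$$ \mathbf{y} = [y_{i_1}, y_{i_2}, \ldots, y_{i_J}]^T \tag{21} $$

and the $J \times (n+1)$ matrix $\boldsymbol{\Phi}$ is defined as

$$ \boldsymbol{\Phi} = \begin{bmatrix} \boldsymbol{\phi}_n^T(\mathbf{x}_{i_1}) \\ \boldsymbol{\phi}_n^T(\mathbf{x}_{i_2}) \\ \vdots \\ \boldsymbol{\phi}_n^T(\mathbf{x}_{i_J}) \end{bmatrix} \tag{22} $$

where, for example, $\mathbf{x}_{i_1}$ corresponds to $y_{i_1}$.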
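Once the $J$ labels are returned, (20) is an ordinary least-squares fit on the rows in (22). The sketch assumes NumPy, and the names `fit_from_excavations` and `score` are hypothetical. It swaps the explicit normal equations of (20) for `np.linalg.lstsq`, which returns the same minimizer when $\boldsymbol{\Phi}$ has full column rank while avoiding the formation of $\boldsymbol{\Phi}^T\boldsymbol{\Phi}$.

```python
import numpy as np

def fit_from_excavations(Phi_s, y_s):
    """Least-squares weights of (20), w = (Phi^T Phi)^{-1} Phi^T y.

    Phi_s : J x (n+1) matrix of (22) for the excavated signatures X_{s,J}
    y_s   : the J labels of (21) obtained by excavation
    """
    w, *_ = np.linalg.lstsq(np.asarray(Phi_s, float), np.asarray(y_s, float), rcond=None)
    return w

def score(Phi_rest, w):
    """f(x) = w^T phi_n(x) for each unexcavated signature x not in X_{s,J}."""
    return np.asarray(Phi_rest, float) @ w
```

The scores returned by `score` are the values $f(\mathbf{x})$ that are thresholded in the classification stage.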


In the classification stage we consider $\mathbf{x} \notin X_{s,J}$ and compute $f(\mathbf{x})$. For a prescribed threshold $t$, $\mathbf{x}$ is deemed associated with the $+1$ class if $f(\mathbf{x}) > t$ and with the $-1$ class if $f(\mathbf{x}) < t$; varying the threshold $t$ yields the receiver operating characteristic (ROC). The key property of the model $f(\mathbf{x})$ is that it is linear in the weights $\mathbf{w}$, which yields a closed-form procedure for the selection of $B_n$ and $X_{s,J}$, as indicated in the previous sections.
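Sweeping the threshold $t$ to trace the ROC can be sketched as follows. This assumes NumPy and that ground-truth labels ($+1$ target, $-1$ clutter) are available for scoring; the function name `roc_points` is hypothetical. Each observed score value, plus $-\infty$, is used as a threshold with the rule "declare $+1$ when $f(\mathbf{x}) > t$", giving one (false-alarm, detection) pair per threshold.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (P_fa, P_d) pairs as the threshold t sweeps from high to low.

    labels are +1 (target) / -1 (clutter); x is declared +1 when f(x) > t.
    """
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    n_pos, n_neg = np.sum(labels == 1), np.sum(labels == -1)
    thresholds = np.concatenate((np.unique(scores)[::-1], [-np.inf]))
    points = []
    for t in thresholds:
        declared = scores > t
        p_d = np.sum(declared & (labels == 1)) / n_pos
        p_fa = np.sum(declared & (labels == -1)) / n_neg
        points.append((float(p_fa), float(p_d)))
    return points

print(roc_points([0.9, 0.8, 0.3, 0.1], [1, -1, 1, -1]))
# prints: [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```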


